Mixed-membership models of scientific publications.

Authors

  • Elena Erosheva
  • Stephen Fienberg
  • John Lafferty
Abstract

PNAS is one of the world's most cited multidisciplinary scientific journals. The official PNAS subject classification is reflected in the topic labels that authors submit with their articles, and these labels largely follow traditionally established disciplines. They include broad field classifications into physical sciences, biological sciences, and social sciences, with further subtopic classifications within each field. Focusing on biological sciences, we explore an internal soft-classification structure of articles based only on semantic decompositions of abstracts and bibliographies and compare it with the formal discipline classifications. Our model assumes that there is a fixed number of internal categories, each characterized by multinomial distributions over words (in abstracts) and references (in bibliographies). Soft classification for each article is based on the proportions of the article's content coming from each category. We discuss the appropriateness of the model for the PNAS database as well as other features of the data relevant to soft classification.
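The generative idea in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the authors' implementation: it assumes a Dirichlet distribution for the per-article membership proportions (the abstract does not name one), and the function names, vocabulary sizes, and counts are all hypothetical. Each word and each reference is produced by first drawing a category according to the article's membership proportions and then drawing from that category's multinomial distribution.

```python
import random


def sample_dirichlet(rng, alpha):
    """Draw one probability vector from a Dirichlet(alpha) via gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def draw_items(rng, membership, category_dists, n_items):
    """For each item, pick a category by membership, then an item from that category."""
    categories = range(len(membership))
    items = []
    for _ in range(n_items):
        k = rng.choices(categories, weights=membership)[0]
        dist = category_dists[k]
        items.append(rng.choices(range(len(dist)), weights=dist)[0])
    return items


def simulate_article(rng, membership, word_dists, ref_dists, n_words, n_refs):
    """Simulate one article's abstract words and bibliography references."""
    words = draw_items(rng, membership, word_dists, n_words)
    refs = draw_items(rng, membership, ref_dists, n_refs)
    return words, refs


# Hypothetical sizes: 3 categories, 50-word vocabulary, 20 possible references.
rng = random.Random(0)
K, V, R = 3, 50, 20
word_dists = [sample_dirichlet(rng, [1.0] * V) for _ in range(K)]
ref_dists = [sample_dirichlet(rng, [1.0] * R) for _ in range(K)]

# Soft classification: the article's proportions over the K categories.
membership = sample_dirichlet(rng, [1.0] * K)
words, refs = simulate_article(rng, membership, word_dists, ref_dists, 100, 15)
```

Here `membership` plays the role of the article's soft classification: a vector of K non-negative proportions summing to one, rather than a single category label. Fitting the model to real data means inferring these proportions and the category distributions from observed words and references, which this sketch does not attempt.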

Similar resources

Discovery of Latent Patterns with Hierarchical Bayesian Mixed-Membership Models and the Issue of Model Choice

Model choice is a major methodological issue in the explosive growth of data-mining models involving latent structure for clustering and classification, especially because models often have different parameterizations and very different specifications and constraints. Here, we work from a general formulation of hierarchical Bayesian mixed-membership models and present several model specificatio...


Hierarchical Bayesian Mixed-Membership Models and Latent Pattern Discovery

Hierarchical Bayesian methods expanded markedly with the introduction of MCMC computation in the 1980s, and this was followed by the explosive growth of machine learning tools involving latent structure for clustering and classification. Nonetheless, model choice remains a major methodological issue, largely because competing models used in machine learning often have different parameterization...


Discovering Latent Patterns with Hierarchical Bayesian Mixed-Membership Models

There has been an explosive growth of data-mining models involving latent structure for clustering and classification. While having related objectives these models use different parameterizations and often very different specifications and constraints. Model choice is thus a major methodological issue and a crucial practical one for applications. In this paper, we work from a general formulatio...


Introduction to Mixed Membership Models and Methods

1.1 Historical Developments
1.2 A General Formulation for Mixed Membership Models
1.3 Advantages of Mixed Membership Models in Applied Statistics ...


Bayesian Mixed Membership Models for Soft Classification

The paper describes and applies a fully Bayesian approach to soft classification using mixed membership models. Our model structure has assumptions on four levels: population, subject, latent variable, and sampling scheme. Population level assumptions describe the general structure of the population that is common to all subjects. Subject level assumptions specify the distribution of observable...


Journal:
  • Proceedings of the National Academy of Sciences of the United States of America

Volume 101, Suppl 1

Publication date: 2004